Review for NeurIPS paper: Improving Generalization in Reinforcement Learning with Mixture Regularization

Neural Information Processing Systems

Additional Feedback: Although I believe the arguments for mixup-style regularization make sense, I do have some concerns about potential bias from the ProcGen benchmark. Many of the games in ProcGen are 2D games with a fixed camera (a skim of videos of the environments suggests 8 of the 16 have a fixed camera, and 7 of those 8 have a static image background). We would expect a mixup-style method to do better on these environments, because averaging two images together naturally exposes which parts of the image are static and which are not. So I have some concerns about how well this will generalize to other settings. Based on the training curves, mixup is simply more efficient than PPO on the train-time environments.


Review for NeurIPS paper: Improving Generalization in Reinforcement Learning with Mixture Regularization

Neural Information Processing Systems

This submission was generally understood by reviewers to be a straightforward extension of existing work on supervised learning regularization, thus presenting limited technical novelty. It was reasonably well executed from an experimental perspective and potentially high impact given the strength of the results. In discussion, reviewers debated the merits of the paper, with several arguing that for such a limited algorithmic contribution the analysis component needed to be stronger. R3 would have liked to see a broader empirical assessment, a deeper discussion and interrogation of limitations, and an analysis of whether combining the method with other forms of data augmentation yields additive gains, while R1 felt that evaluation on strictly image-based environments was potentially misleading. I concur with several of these criticisms, but must balance the paper's shortcomings against the value to the community of highlighting a method which is a very clear target for further research, and an already potentially useful entry in a practitioner's toolbox.


Improving Generalization in Reinforcement Learning with Mixture Regularization

Neural Information Processing Systems

Deep reinforcement learning (RL) agents trained in a limited set of environments tend to suffer overfitting and fail to generalize to unseen testing environments. To improve their generalizability, data augmentation approaches (e.g. cutout and random convolution) have previously been explored to increase the data diversity. However, we find these approaches only locally perturb the observations regardless of the training environments, showing limited effectiveness in enhancing the data diversity and the generalization performance. In this work, we introduce a simple approach, named mixreg, which trains agents on a mixture of observations from different training environments and imposes linearity constraints on the observation interpolations and the supervision (e.g. the associated reward) interpolations. Mixreg increases the data diversity more effectively and helps learn smoother policies.
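The core operation the abstract describes — convexly interpolating pairs of observations from different training environments together with their supervision signals — can be sketched as follows. This is a minimal illustration of the mixture idea in the style of mixup, not the authors' implementation; the function name, the per-pair Beta-sampled coefficients, and the use of a scalar target (e.g. a return estimate) are illustrative assumptions.

```python
import numpy as np

def mixreg_batch(obs, targets, alpha=0.2, rng=None):
    """Sketch of mixture regularization on a training batch.

    obs:     array of shape (B, ...), observations drawn from different envs
    targets: array of shape (B,), the associated supervision (e.g. returns)
    alpha:   Beta(alpha, alpha) parameter controlling interpolation strength
    """
    rng = np.random.default_rng() if rng is None else rng
    batch = obs.shape[0]
    lam = rng.beta(alpha, alpha, size=batch)        # one coefficient per pair
    perm = rng.permutation(batch)                   # random mixing partners
    # Broadcast lambda over the non-batch dimensions of the observations.
    lam_obs = lam.reshape((batch,) + (1,) * (obs.ndim - 1))
    mixed_obs = lam_obs * obs + (1.0 - lam_obs) * obs[perm]
    # The same linear combination is imposed on the supervision signal.
    mixed_targets = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_obs, mixed_targets
```

Training the policy/value network on `(mixed_obs, mixed_targets)` instead of the raw batch is what imposes the linearity constraint: the prediction on an interpolated observation is pushed toward the corresponding interpolation of the supervision, which is the smoothness property the abstract claims.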